Resource Management for High-Performance PC Clusters
Abstract
With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build large compute clusters from standard PCs. Many of these clusters are operated with workstation cluster management software in high-throughput or single-user mode. For very large clusters with more than 100 PEs, however, it becomes necessary to implement full-fledged resource management software that allows the system to be partitioned for multi-user access. In this paper, we present our Computing Center Software (CCS), which was originally designed for managing massively parallel high-performance computers and has now been adapted to modern workstation clusters. It provides partitioning of exclusive and non-exclusive resources; hardware-independent scheduling of interactive and batch jobs; open, extensible interfaces to other resource management systems; and a high degree of reliability.
Similar resources
Resource Management and Scheduling on SupernodeII
In recent decades, the demand for using computers to solve grand and challenging problems has grown in both size and complexity. Distributed and parallel computing is thus important. Enabling technologies in high-speed communication have made PC-based clusters a mainstream parallel and distributed platform for high-performance, high-throughput and high-availability computing. To en...
Topology-Aware Parallel Molecular Dynamics Simulation Algorithm
We have developed the topology-aware parallel molecular dynamics (TAPMD) algorithm, in which the processors are rearranged automatically according to resource topology so as to minimize the cost required for the simulation. It is demonstrated that TAPMD can reduce the communication time to less than half of the worst-case time on distributed PC clusters. This improvement is ex...
Managing Multiple Multi-user PC Clusters
While PC clusters now provide the aggregate power of supercomputers of just a few years ago, they lack the integrated system management and administrative tools that are needed to make them easy to use. Simple tasks like rebooting or installing software can be quite difficult when the “computer” is rows of PC towers. This paper describes a Java-based tool suite called M3C (Managing Multiple Mul...
Case Study: Setting up and running a production Linux cluster at Pacific Northwest National Laboratory
With the low price and increasing performance of commodity computer hardware, it is important to study the viability of using clusters of relatively inexpensive computers to produce a stable system capable of meeting the current demands for high-performance massively parallel computing. A 192-processor cluster was installed to test and develop methods that would make the PC cluster a workable alternat...
XtremWeb & Condor sharing resources between Internet connected Condor pools
Grid computing presents two major challenges for deploying large-scale applications across wide area networks that gather volunteer PCs and clusters or parallel computers as computational resources: security and fault tolerance. This paper presents a lightweight Grid solution for the deployment of multi-parameter applications on a set of clusters protected by firewalls. The system uses a hierarchic...